
cephadm/services/cephadmservice: shutdown monitors before removing them #45511

Closed

Conversation

kamoltat
Member

@kamoltat kamoltat commented Mar 17, 2022

Motivation behind this PR is from the tracker:
https://tracker.ceph.com/issues/50089

The `ceph orch apply mon <num>` command removes
a monitor from the cluster before shutting
the monitor down, which has created issues in the
mon/elector.cc part of Ceph. The design of
the elector class assumes that we shut a monitor
down before removing it from the cluster.
Therefore, this ordering should be enforced everywhere
in Ceph.

Correct steps for manually removing a monitor:

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#removing-a-monitor-manual
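
For reference, the documented manual procedure boils down to stopping the mon daemon first and only then removing it from the monmap. A minimal sketch of that ordering (the fsid-based systemd unit name and the direct use of subprocess are illustrative assumptions, not the orchestrator code):

import subprocess


def remove_mon_manually(fsid: str, mon_id: str) -> None:
    # 1. Stop the running monitor daemon first (unit name follows the
    #    ceph-<fsid>@mon.<id> convention seen elsewhere in this PR).
    subprocess.run(["systemctl", "stop", f"ceph-{fsid}@mon.{mon_id}"], check=True)
    # 2. Only once the mon is down, remove it from the cluster/monmap.
    subprocess.run(["ceph", "mon", "remove", mon_id], check=True)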

Moreover, this PR adds a test where we have 5 MONs on
5 different hosts and try to reduce
the number of MONs from 5 to 3 using
the command `ceph orch apply mon 3`,
and then increase the number of MONs from
3 to 5 using the command
`ceph orch apply mon 5`.

The test evaluates the correctness of the commands
and inspects whether there are crashes or unhealthy
cluster states.

This PR is blocked by:
https://tracker.ceph.com/issues/55695 - the test won't pass because there is a chance that the monitor(s) are not removed.

Fixes: https://tracker.ceph.com/issues/50089

Signed-off-by: Kamoltat ksirivad@redhat.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@neha-ojha
Member

@kamoltat could you please add the same context you provided in https://tracker.ceph.com/issues/50089#note-11 to the commit message as well, so that reviewers understand the motivation behind this PR?

@neha-ojha neha-ojha requested a review from adk3798 March 17, 2022 21:22
Comment on lines 599 to 600
cmd = "sudo systemctl stop ceph-%s@%s" % (fsid, service_name)
os.system(cmd)
Member Author

@adk3798 I don't think this will actually work. Currently, I don't think there is a MON or other ceph CLI command that allows us to stop the monitor; there is only `service ceph -a stop mon.{mon-id}` or using systemctl. These are all executed through cmd and not the ceph CLI. Do you mind guiding me on how I can execute cmd commands inside this part of cephadm? Thanks!

Contributor

@adk3798 adk3798 Mar 17, 2022

I'd think a change like this to the way we shut down the daemon belongs more in https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L6446 than here in the mgr module. All daemon removals run through that code path, although it looks like it already runs systemctl stop. Either way, that's where you'd want to look.
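
(For anyone skimming: a rough sketch of the ordering being discussed for that cephadm removal path. The function name and the direct use of subprocess are illustrative assumptions; the real code path goes through cephadm's own call()/container helpers.)

import subprocess


def rm_mon_daemon_sketch(fsid: str, daemon_id: str) -> None:
    unit = f"ceph-{fsid}@mon.{daemon_id}"
    # cephadm's daemon-removal path already stops the unit, roughly:
    subprocess.run(["systemctl", "stop", unit], check=True)
    # The proposal is to drop the mon from the monmap only after that
    # stop, before the unit is disabled and the data dir is removed:
    subprocess.run(["ceph", "mon", "rm", daemon_id], check=True)
    subprocess.run(["systemctl", "disable", unit], check=True)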

Member Author

thanks! will take a look

@kamoltat
Member Author

@neha-ojha sure thing! Sorry, sometimes I like to open PRs and change the descriptions etc. afterwards.

@kamoltat kamoltat changed the title cephadm/services/cephadmservice: shutdown before remove cephadm/services/cephadmservice: shutdown monitors before removing them Mar 17, 2022
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch from 17d4ba5 to 9e4f66c Compare March 17, 2022 21:43
@kamoltat kamoltat self-assigned this Mar 17, 2022
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch 3 times, most recently from 84c608d to 3b240b7 Compare March 19, 2022 20:58

# remove the mon from cluster after we stop the mon.
if daemon_type == 'mon':
call(ctx, ['ceph', 'mon', 'rm', daemon_id],
Contributor

Can I get some clarification on what the exact order of operations we need is? We have a few things going on

  • Stopping the mon (right above this)
  • removing the mon from the quorum (this command here)
  • disabling and fully removing the mon (next few lines after this)

What are the requirements in terms of ordering here? If it's just that we need to stop the mon before removing it from the quorum, you could consider just moving the mon rm command you removed from pre_remove into post_remove, which will run after rm_daemon (thus making it the last of the 3 operations to happen). If it is necessary to have this exact order where we stop the mon, then remove it from quorum, then disable and fully remove the mon, you could leave it here. However, this would need to be done as a shell command (a shell command basically starts up a container to run the command in) rather than just calling call (see https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L3640), as there is no guarantee that the ceph packages are installed directly on the host (which is where call would run). This could be a bit tricky to set up correctly, which is why I want to verify that we NEED this to be done in the middle here and not just after the mon is stopped.
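
(A hedged sketch of the post_remove option described above, written as a standalone function; the mgr object and the 'name' argument of 'mon rm' are assumptions based on how check_mon_command is used in the mgr-module snippet later in this review.)

import logging

logger = logging.getLogger(__name__)


def post_remove_mon_sketch(mgr, daemon_id: str) -> None:
    # Would run after rm_daemon, i.e. after the mon daemon has already
    # been stopped and removed on the host, so the shutdown-before-
    # removal ordering is preserved without shelling out from cephadm.
    logger.info('Removing monitor %s from monmap...', daemon_id)
    ret, out, err = mgr.check_mon_command({
        'prefix': 'mon rm',
        'name': daemon_id,
    })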

@kamoltat
Member Author

1/1 passed; evaluated by manually checking the monitor leader log to confirm there are no crashes:
https://pulpito.ceph.com/ksirivad-2022-03-16_18:17:21-orch:cephadm-wip-ksirivad-recreate-50089-distro-basic-smithi/6739243/

@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch from 3b240b7 to 08feede Compare March 29, 2022 21:44
@github-actions github-actions bot added the tests label Mar 29, 2022
@adk3798 adk3798 force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch from 4f4fcbe to 587d8bf Compare April 11, 2022 17:04
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch 2 times, most recently from d800aa1 to e6a68d7 Compare April 25, 2022 17:55
@kamoltat
Member Author

kamoltat commented Apr 25, 2022

@adk3798 Forgot to rebase locally and force-pushed here, so I had to copy-paste your work into my new commit. Just letting you know in case you wonder what happened (sorry, didn't mean to steal your work lol).

@kamoltat
Member Author

Make check failed; flake8 wants 2 blank lines :(

cephadm:6724:1: E302 expected 2 blank lines, found 1
1     E302 expected 2 blank lines, found 1
ERROR: InvocationError for command /home/jenkins-build/build/workspace/ceph-pull-requests/src/cephadm/.tox/flake8/bin/flake8 --config=tox.ini cephadm (exited with code 1)
___________________________________ summary ____________________________________
  py3: commands succeeded
  mypy: commands succeeded
ERROR:   flake8: commands failed
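
(For reference, E302 just means a top-level def or class needs two blank lines above it, e.g.:)

some_module_level_code = True


def command_example():
    # the two blank lines above this top-level def satisfy E302
    pass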

@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch from e6a68d7 to 96401ed Compare April 25, 2022 20:41

@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch 2 times, most recently from cc637e0 to f9f59c2 Compare April 26, 2022 18:38
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch from f9f59c2 to 493f8b3 Compare May 17, 2022 23:27
@adk3798
Contributor

adk3798 commented Jul 25, 2022

https://pulpito.ceph.com/adking-2022-07-22_01:56:03-orch:cephadm-wip-adk3-testing-2022-07-21-1852-distro-default-smithi/

1 infra failure (the dead job), 1 issue with test_repos that broke a few days ago and is unrelated to PRs in the run

# remove mon from quorum before we destroy the daemon
logger.info('Removing monitor %s from monmap...' % daemon_id)
ret, out, err = self.mgr.check_mon_command({
'prefix': 'mon rm',
Contributor

I'm still getting health warnings whenever I scale down mons at all with this patch.

[ceph: root@vm-00 /]# ceph orch ls 
NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT  
alertmanager               ?:9093,9094      1/1  47s ago    6m   count:1    
crash                                       3/3  49s ago    6m   *          
grafana                    ?:3000           1/1  47s ago    6m   count:1    
mgr                                         2/2  49s ago    7m   count:2    
mon                                         3/5  49s ago    7m   count:5    
node-exporter              ?:9100           3/3  49s ago    6m   *          
osd.all-available-devices                    12  49s ago    4m   *          
prometheus                 ?:9095           1/1  49s ago    6m   count:1    
[ceph: root@vm-00 /]# ceph orch ps
NAME                 HOST   PORTS        STATUS          REFRESHED   AGE  MEM USE  MEM LIM  VERSION                 IMAGE ID      CONTAINER ID  
alertmanager.vm-00   vm-00  *:9093,9094  running (70s)     51s ago    6m    15.9M        -                          ba2b418f427c  9f6e62f1c9d1  
crash.vm-00          vm-00               running (6m)      51s ago    6m    6979k        -  17.0.0-13734-g96abb011  f4b280ba1496  966c1dfc7d74  
crash.vm-01          vm-01               running (4m)      53s ago    4m    7159k        -  17.0.0-13734-g96abb011  f4b280ba1496  c6b54137989f  
crash.vm-02          vm-02               running (4m)      53s ago    4m    7180k        -  17.0.0-13734-g96abb011  f4b280ba1496  b9d9d21e028e  
grafana.vm-00        vm-00  *:3000       running (66s)     51s ago    6m    40.8M        -  8.3.5                   dad864ee21e9  3b91c5d4cc78  
mgr.vm-00.bdwsut     vm-00  *:9283       running (7m)      51s ago    7m     479M        -  17.0.0-13734-g96abb011  f4b280ba1496  06aec7958775  
mgr.vm-02.haworo     vm-02  *:8443,9283  running (4m)      53s ago    4m     428M        -  17.0.0-13734-g96abb011  f4b280ba1496  5fd8cceb7c0d  
mon.vm-00            vm-00               running (7m)      51s ago    7m    56.5M    2048M  17.0.0-13734-g96abb011  f4b280ba1496  7d09db9b0721  
mon.vm-01            vm-01               running (4m)      53s ago    4m    48.0M    2048M  17.0.0-13734-g96abb011  f4b280ba1496  ea18ee80f985  
mon.vm-02            vm-02               running (4m)      53s ago    4m    52.6M    2048M  17.0.0-13734-g96abb011  f4b280ba1496  30934b1e44e1  
node-exporter.vm-00  vm-00  *:9100       running (6m)      51s ago    6m    19.5M        -                          1dbe0e931976  ebdb46f6cffe  
node-exporter.vm-01  vm-01  *:9100       running (4m)      53s ago    4m    24.0M        -                          1dbe0e931976  2bc17a3379c4  
node-exporter.vm-02  vm-02  *:9100       running (4m)      53s ago    4m    22.8M        -                          1dbe0e931976  1838354f8634  
osd.0                vm-01               running (3m)      53s ago    3m    71.6M    6203M  17.0.0-13734-g96abb011  f4b280ba1496  45d1ec388ca2  
osd.1                vm-02               running (3m)      53s ago    3m    71.9M    5435M  17.0.0-13734-g96abb011  f4b280ba1496  e8829f51624e  
osd.2                vm-00               running (2m)      51s ago    2m    68.1M    4923M  17.0.0-13734-g96abb011  f4b280ba1496  6340cd1be0d5  
osd.3                vm-01               running (2m)      53s ago    2m    70.8M    6203M  17.0.0-13734-g96abb011  f4b280ba1496  ad77363d8a7f  
osd.4                vm-02               running (2m)      53s ago    2m    68.7M    5435M  17.0.0-13734-g96abb011  f4b280ba1496  9219b47952b9  
osd.5                vm-00               running (2m)      51s ago    2m    70.2M    4923M  17.0.0-13734-g96abb011  f4b280ba1496  14d5661c5bab  
osd.6                vm-02               running (110s)    53s ago    2m    68.0M    5435M  17.0.0-13734-g96abb011  f4b280ba1496  427652c89872  
osd.7                vm-02               running (88s)     53s ago  100s    69.0M    5435M  17.0.0-13734-g96abb011  f4b280ba1496  f41b405ce4ca  
osd.8                vm-01               running (2m)      53s ago    2m    68.0M    6203M  17.0.0-13734-g96abb011  f4b280ba1496  3662aca0f34a  
osd.9                vm-00               running (2m)      51s ago    2m    67.4M    4923M  17.0.0-13734-g96abb011  f4b280ba1496  a3c94dc82ebf  
osd.10               vm-01               running (2m)      53s ago    2m    67.2M    6203M  17.0.0-13734-g96abb011  f4b280ba1496  30a0e569b975  
osd.11               vm-00               running (3m)      51s ago    3m    72.0M    4923M  17.0.0-13734-g96abb011  f4b280ba1496  fc1ab3e586c3  
prometheus.vm-01     vm-01  *:9095       running (4m)      53s ago    4m    59.5M        -                          514e6a882f6e  640fcf0a4ee7  
[ceph: root@vm-00 /]# ceph health detail
HEALTH_OK
[ceph: root@vm-00 /]# ceph orch apply mon 2
Scheduled mon update...
[ceph: root@vm-00 /]# ceph orch ls
NAME                       PORTS        RUNNING  REFRESHED  AGE   PLACEMENT  
alertmanager               ?:9093,9094      1/1  97s ago    9m    count:1    
crash                                       3/3  97s ago    9m    *          
grafana                    ?:3000           1/1  97s ago    9m    count:1    
mgr                                         2/2  97s ago    9m    count:2    
mon                                         2/2  97s ago    103s  count:2    
node-exporter              ?:9100           3/3  97s ago    9m    *          
osd.all-available-devices                    12  97s ago    7m    *          
prometheus                 ?:9095           1/1  32s ago    9m    count:1    
[ceph: root@vm-00 /]# ceph orch ps
NAME                 HOST   PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION                 IMAGE ID      CONTAINER ID  
alertmanager.vm-00   vm-00  *:9093,9094  running (2m)     9s ago   7m    23.1M        -                          ba2b418f427c  9f6e62f1c9d1  
crash.vm-00          vm-00               running (7m)     9s ago   7m    6979k        -  17.0.0-13734-g96abb011  f4b280ba1496  966c1dfc7d74  
crash.vm-01          vm-01               running (5m)    11s ago   5m    7159k        -  17.0.0-13734-g96abb011  f4b280ba1496  c6b54137989f  
crash.vm-02          vm-02               running (5m)    11s ago   5m    7180k        -  17.0.0-13734-g96abb011  f4b280ba1496  b9d9d21e028e  
grafana.vm-00        vm-00  *:3000       running (2m)     9s ago   7m    44.9M        -  8.3.5                   dad864ee21e9  3b91c5d4cc78  
mgr.vm-00.bdwsut     vm-00  *:9283       running (8m)     9s ago   8m     487M        -  17.0.0-13734-g96abb011  f4b280ba1496  06aec7958775  
mgr.vm-02.haworo     vm-02  *:8443,9283  running (5m)    11s ago   5m     428M        -  17.0.0-13734-g96abb011  f4b280ba1496  5fd8cceb7c0d  
mon.vm-00            vm-00               running (8m)     9s ago   8m    59.7M    2048M  17.0.0-13734-g96abb011  f4b280ba1496  7d09db9b0721  
mon.vm-01            vm-01               running (5m)    11s ago   5m    50.7M    2048M  17.0.0-13734-g96abb011  f4b280ba1496  ea18ee80f985  
node-exporter.vm-00  vm-00  *:9100       running (7m)     9s ago   7m    19.6M        -                          1dbe0e931976  ebdb46f6cffe  
node-exporter.vm-01  vm-01  *:9100       running (5m)    11s ago   5m    21.4M        -                          1dbe0e931976  2bc17a3379c4  
node-exporter.vm-02  vm-02  *:9100       running (5m)    11s ago   5m    24.5M        -                          1dbe0e931976  1838354f8634  
osd.0                vm-01               running (4m)    11s ago   4m    72.9M    6203M  17.0.0-13734-g96abb011  f4b280ba1496  45d1ec388ca2  
osd.1                vm-02               running (4m)    11s ago   4m    73.2M    5435M  17.0.0-13734-g96abb011  f4b280ba1496  e8829f51624e  
osd.2                vm-00               running (3m)     9s ago   3m    69.0M    4923M  17.0.0-13734-g96abb011  f4b280ba1496  6340cd1be0d5  
osd.3                vm-01               running (3m)    11s ago   3m    71.7M    6203M  17.0.0-13734-g96abb011  f4b280ba1496  ad77363d8a7f  
osd.4                vm-02               running (3m)    11s ago   3m    69.8M    5435M  17.0.0-13734-g96abb011  f4b280ba1496  9219b47952b9  
osd.5                vm-00               running (3m)     9s ago   3m    71.1M    4923M  17.0.0-13734-g96abb011  f4b280ba1496  14d5661c5bab  
osd.6                vm-02               running (2m)    11s ago   3m    68.9M    5435M  17.0.0-13734-g96abb011  f4b280ba1496  427652c89872  
osd.7                vm-02               running (2m)    11s ago   2m    70.0M    5435M  17.0.0-13734-g96abb011  f4b280ba1496  f41b405ce4ca  
osd.8                vm-01               running (3m)    11s ago   3m    69.0M    6203M  17.0.0-13734-g96abb011  f4b280ba1496  3662aca0f34a  
osd.9                vm-00               running (3m)     9s ago   3m    68.4M    4923M  17.0.0-13734-g96abb011  f4b280ba1496  a3c94dc82ebf  
osd.10               vm-01               running (3m)    11s ago   3m    68.5M    6203M  17.0.0-13734-g96abb011  f4b280ba1496  30a0e569b975  
osd.11               vm-00               running (4m)     9s ago   4m    73.4M    4923M  17.0.0-13734-g96abb011  f4b280ba1496  fc1ab3e586c3  
prometheus.vm-01     vm-01  *:9095       running (5m)    11s ago   5m    64.2M        -                          514e6a882f6e  640fcf0a4ee7  
[ceph: root@vm-00 /]# ceph health detail
HEALTH_WARN 1/3 mons down, quorum vm-00,vm-01
[WRN] MON_DOWN: 1/3 mons down, quorum vm-00,vm-01
    mon.vm-02 (rank 1) addr [v2:192.168.122.121:3300/0,v1:192.168.122.121:6789/0] is down (out of quorum)

@kamoltat is it just me getting this? When you scale down you don't get any warning?

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Motivation behind this PR is from the tracker:
https://tracker.ceph.com/issues/50089
The `ceph orch apply mon <num>` command removes
a monitor from the cluster before shutting
the monitor down, which has created issues in the
mon/elector.cc part of Ceph. The design of
the elector class assumes that we shut a monitor
down before removing it from the cluster.
Therefore, this ordering should be enforced everywhere
in Ceph.

Correct steps for manually removing a monitor:

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#removing-a-monitor-manual

In the function `command_rm_daemon()` in src/cephadm/cephadm,
if the daemon is a monitor, we remove it from the cluster
by calling the `CephContainer` class,
using the command `mon remove <daemon id>`.
This is done strictly after we have stopped the monitor daemon using
`systemctl stop`.

In cephadm/services/cephadmservice.py,
in the function `pre_remove()`,
we dropped the step that removed the
monitor from the cluster before destroying the daemon.

Fixes: https://tracker.ceph.com/issues/50089

Signed-off-by: Kamoltat <ksirivad@redhat.com>
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch from 347518f to bc72ae5 Compare October 6, 2022 18:37
Added a test where we have 5 MONs on
5 different hosts and try to reduce
the number of MONs from 5 to 3 using
the command ``ceph orch apply mon 3``,
and then increase the number of MONs from
3 to 5 using the command
``ceph orch apply mon 5``.

The test evaluates the correctness of the commands
and inspects whether there are crashes or unhealthy
cluster states.

This test was motivated by the bug:
https://tracker.ceph.com/issues/50089

Signed-off-by: Kamoltat <ksirivad@redhat.com>
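
(A rough standalone sketch of what that test exercises, using only CLI commands that appear in this PR; the polling loop, timeout, and JSON field handling are illustrative assumptions, not the actual qa/teuthology task.)

import json
import subprocess
import time


def _ceph(*args: str) -> str:
    # Thin wrapper around the ceph CLI; assumes it is on PATH and can
    # reach the cluster (e.g. run inside a cephadm shell).
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout


def scale_mons_and_check(count: int, timeout: int = 600) -> None:
    # Apply the new mon count, as in `ceph orch apply mon <num>`.
    _ceph("orch", "apply", "mon", str(count))
    # Poll until that many mons are in quorum.
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = json.loads(_ceph("quorum_status", "--format=json"))
        if len(status.get("quorum_names", [])) == count:
            break
        time.sleep(10)
    else:
        raise RuntimeError(f"never reached {count} mons in quorum")
    # The cluster should end up healthy; the real test additionally
    # checks the mon leader log for crashes/asserts.
    health = json.loads(_ceph("health", "--format=json"))
    assert health["status"] == "HEALTH_OK", health


# e.g. scale down 5 -> 3, then back up 3 -> 5:
# scale_mons_and_check(3)
# scale_mons_and_check(5)
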
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-ceph-orch-apply-mon branch from bc72ae5 to 8e0a046 Compare October 7, 2022 03:01
@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jan 28, 2023
@kamoltat kamoltat removed the stale label Feb 21, 2023
@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Apr 22, 2023
@kamoltat kamoltat removed the stale label May 15, 2023
@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jul 14, 2023
@kamoltat kamoltat removed the stale label Jul 14, 2023
@kamoltat
Member Author

kamoltat commented Sep 5, 2023

(From Core Rados PR scrub)

@adk3798 So the issue is already solved by this other PR: #44993

This PR will only change the mechanism of `ceph orch apply mon <num>` so that we shut down the monitor before removing it. Since PR #44993 allows us to remove before shutdown without running into an assert error, I'll leave it up to you whether you want to pursue the change or not. We can close it if you feel the change is not worth doing.


github-actions bot commented Nov 4, 2023

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Nov 4, 2023

github-actions bot commented Dec 4, 2023

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Dec 4, 2023